METHODS AND CIRCUITS FOR MEASURING CLOCK SKEW ON
PROGRAMMABLE LOGIC DEVICES 

Siuki Chan 

FIELD OF THE INVENTION 

[0001] This invention relates generally to methods and 
circuit configurations for measuring signal skew in 
programmable logic devices. 

BACKGROUND 

[0002] Programmable logic devices (PLDs) are a well-known
type of digital integrated circuit that may be programmed
by a user (e.g., a circuit designer) to perform specified
logic functions. One type of PLD, the field-programmable
gate array (FPGA), typically includes an array of
configurable logic blocks (CLBs) that are programmably
interconnected to each other and to programmable
input/output blocks (IOBs). This collection of
configurable logic is personalized by loading configuration
data into internal configuration memory cells that define
how the CLBs, interconnections, and IOBs are configured.
For a detailed discussion of an exemplary FPGA, see U.S. 
Patent No. 6,144,220 entitled "FPGA Architecture Using 
Multiplexers that Incorporate a Logic Gate," by Steven P. 
Young, which is incorporated herein by reference. 
[0003] Figure 1 (Prior Art) depicts a conventional FPGA 
100, examples of which include the Spartan™ and Virtex™ 
FPGAs available from Xilinx, Inc., of San Jose, California. 
FPGA 100 includes an array of programmably interconnected 
CLBs 105. FPGA 100 additionally includes a clock 
distribution network 110 that can be connected to internal
or external clock sources via a global clock buffer BUFG.
Many other FPGA resources are omitted from Figure 1 for 
brevity. 

[0004] Manufacturers of PLDs, including FPGAs, would
like to guarantee the highest speed performance possible 
without their devices failing to meet timing 
specifications. PLD designers therefore measure circuit 
timing as accurately as possible to minimize the guard 
bands required to ensure correct device performance. U.S. 
Patent No. 6,144,262 entitled "Circuit for Measuring Signal 
Delays of Asynchronous Register Inputs," by Christopher 
Kingsley describes circuits and methods of measuring 
circuit timing in programmable logic devices, and is 
incorporated herein by reference. U.S. Patent No. 
5,795,068 entitled "Method and Apparatus for Measuring 
Localized Temperatures and Voltages on Integrated
Circuits," by Robert O. Conn describes ring oscillator
configurations on FPGAs, and is also incorporated herein by
reference.

[0005] Clock distribution network 110 includes a source 
spine 110S that conveys clock signals to a source node 112
in the interior of FPGA 100. From there, a horizontal
spine 110H conveys clock signals to a number of vertical
clock spines 110V. Finally, a number of clock destination
branches 110D extend to each CLB 105. Clock distribution
network 110 can be programmably connected to any of CLBs 
105 via programmable interconnect points. The above-cited 
Young patent describes exemplary programming technologies. 
[0006] Clock distribution network 110 typically includes 
clock buffers 115 placed and sized to minimize clock skew, 
where skew is defined as the difference in path delays from 
clock input GCLK to each of CLBs 105 and any other clock
loads, such as embedded blocks of memory and IOBs. Many
different buffer and conductor configurations are possible;
the selected implementation depends on design
requirements.

[0007] High-performance clock distribution networks,
such as network 110, are designed to minimize clock skew.
The delays inherent in network 110 are typically short
relative to the delays associated with other FPGA
resources. The low skew is beneficial from the
standpoint of performance, but makes it difficult to
determine clock skew accurately because conventional
test circuitry normally introduces more skew than the clock
distribution network. There is therefore a need for a more
accurate means of measuring skew on programmable logic
devices.

SUMMARY 

[0008] The present invention is directed to a method for 
accurately measuring the skew of clock distribution 
networks on programmable logic devices. Individual clock 
distribution networks are modeled using a sequence of 
delay-element configurations formed on the device using 
configurable logic. Each delay element includes a portion 
of the clock network for which skew is of interest, and 
consequently exhibits a delay that depends, in part, on the 
skew imposed by the portion of interest. The delay through 
each delay element is measured by incorporating the delay 
elements into ring oscillators and measuring the resulting 
period. 

[0009] The various delay-element configurations are 
modeled mathematically as the sum of a series of delays. 
The delay-element configurations are designed so their
respective equations can be combined to solve for the delay
contribution, or skew, of the portion of the clock network 
for which skew is to be measured. The delay associated 
with the portion of interest can then be combined with skew 
measurements for other portions of the clock network to 
more completely describe the network. 
[0010] The claims, and not this summary, define the 
scope of the invention. 

BRIEF DESCRIPTION OF THE FIGURES 

[0011] Figure 1 (Prior Art) depicts a conventional FPGA 
100. 

[0012] Figure 2 depicts an FPGA oscillator configuration 
200. 

[0013] Figure 3 depicts an FPGA oscillator configuration 
300. 

[0014] Figure 4 depicts an FPGA oscillator configuration 
400. 

[0015] Figure 5 depicts an FPGA oscillator configuration 
500. 

DETAILED DESCRIPTION 

[0016] Figures 2-5 schematically depict FPGA 
configurations used in accordance with embodiments of the 
invention to accurately measure global clock skew for clock 
distribution network 110 of Figure 1. In the examples, the 
FPGA is a Virtex™ XCV1000 FPGA, available from Xilinx, 
Inc., which includes an array of 96 columns and 64 rows of 
CLBs, or a total of 6,144 CLBs. The number of CLBs and
other FPGA resources shown in the figures is limited for
brevity.
[0017] Figure 2 depicts an FPGA oscillator configuration
200 in which a CLB R2C24 (for row 2, column 24), a CLB 
R17C24, and a feedback circuit 205 are interconnected to 
form a ring oscillator. Circuit 205 and the associated 
connections 215 and 220 are made of available FPGA 
resources and connect to clock distribution network 110 via 
global clock buffer BUFG. The resources interconnected as 
shown using dashed and bold interconnect and clock lines 
form the ring oscillator. 

[0018] The FPGA is programmed (i.e., configured) so the 
global clock buffer BUFG connects to the clock input 
terminal of CLB R2C24 via source spine 110S, horizontal 
clock spine 110H, a vertical clock spine 110V, and one of 
destination branches 110D. The synchronous output terminal
of CLB R2C24 is programmably connected to an asynchronous 
input terminal of CLB R17C24 via some programmable 
interconnect resources R2->R17, so called because the 
routing connects row 2 to row 17. Finally, an output 
terminal of CLB R17C24 is programmably connected to the 
input terminal of global buffer BUFG via programmable 
interconnect resources 215 and 220 and circuit 205. 
[0019] As oscillator configuration 200 oscillates, the
oscillation period T200 provides a measure of the speed of
the interconnected components. For example, if the average
period T200 of configuration 200 is ten nanoseconds, then
the average time required for positive- and negative-going
signal transitions to traverse the ring of components is
ten nanoseconds. The above-incorporated Kingsley patent
describes some oscillators for use with the present
invention.

[0020] The delay around the path of oscillator 200 is
the sum of the delays associated with vertical spine 110V,
a one-column-long portion of a destination branch 110D, the
clock-to-out (Clk->Out) delay of CLB R2C24, the
interconnect delay of net R2->R17, and the combined delay
K imposed by CLB R17C24, connections 215 and 220,
circuit 205, buffer BUFG, and source spine 110S. The
delay analysis can be simplified by assuming nearby CLBs
exhibit identical clock-to-out (Clk->Out) delays. This is
a reasonable assumption for identical components formed in
close proximity.

[0021] Stated mathematically, the oscillation period T200
of oscillator configuration 200 is:

T200 = 30SK + C + Clk->Out + DTB + K (1)

[0022] where SK is the skew imposed by spine 110V
between adjacent clock destination branches 110D, C is the
delay associated with a one-column-long portion of a branch
110D, Clk->Out is the clock-to-out delay of a CLB, DTB is
the delay encountered by signals traveling from top to
bottom from row 2 to row 17 along net R2->R17, and K is the
delay associated with that portion of oscillator
configuration 200 depicted using dashed lines. Nets
described herein as having identical delays are defined
using device programming software to establish identical or
substantially identical routes, and therefore to impose
identical or substantially identical delays. The process
of forcing device programming software to select specific
routing paths is well understood by those of skill in the
art of defining circuit configurations for programmable
logic devices.
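
Equation 1 can be read as a simple additive delay model. The following minimal Python sketch shows how the terms combine. The function name, parameter names, and delay values (in picoseconds) are hypothetical, chosen only for illustration and not taken from any device.

```python
def period_200(sk, c, clk_to_out, dtb, k, branches=30):
    """Equation 1: T200 = 30SK + C + Clk->Out + DTB + K (all values in ps).

    `branches` is the multiplier on SK in equation 1 (30 in the example).
    """
    return branches * sk + c + clk_to_out + dtb + k


# Hypothetical delays in picoseconds, for illustration only.
t200 = period_200(sk=5, c=40, clk_to_out=900, dtb=1200, k=7710)
print(t200)  # 10000 ps, i.e. ten nanoseconds
```

Only the total T200 is observable on the device; the individual terms passed in here are unknowns in practice.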

[0023] The oscillation period T200 of configuration 200
is generally not, by itself, enough information to
determine the delay associated with any one of the
components of the ring. The FPGA is therefore reconfigured 
to form one or more additional test structures. 
[0024] Figure 3 depicts an FPGA configuration 300 in
which CLB R17C24, CLB R32C24, global clock buffer BUFG, and
the identical circuit 205 of Figure 2 are interconnected
to form a second ring oscillator. CLB R17C24, circuit 205,
clock buffer BUFG, and the dashed portion of clock
distribution network 110 and interconnect resources 215 and
220 are identical to the like-identified structures of
Figure 2; consequently, the sum of the combined delay
contributions of those dashed elements, "K" in equation 1,
is identical in oscillator configurations 200 and 300. The
portions of the oscillators depicted as connected via solid 
lines in the figures can be considered delay elements for 
which the difference in signal propagation delays provides 
a measure of clock skew. Including the delay elements in 
ring oscillators allows for accurate measures of 
propagation delay through the delay elements. 
[0025] The FPGA of Figure 3 is programmed so the clock 
input terminal of CLB R32C24 connects to the output 
terminal of global clock buffer BUFG via a one-column long 
portion of one of destination branches 110D, vertical spine
110V, horizontal spine 110H, and one of source spines 110S. 
The synchronous output terminal of CLB R32C24 is 
programmably connected to an input terminal of CLB R17C24 
via some programmable interconnect resources R32->R17. 
Finally, as in configuration 200, an output terminal of CLB 
R17C24 is programmably connected to the input terminal of 
global buffer BUFG via programmable interconnect resources 
215 and 220 and circuit 205. The dashed portions of
oscillator configurations 200 and 300 are identical, each
imposing a delay K. 

[0026] Stated mathematically, the oscillation period T300
of oscillator configuration 300 is:

T300 = C + Clk->Out + DBT + K (2)

[0027] where C is the delay associated with a
one-column-long portion of a branch 110D, Clk->Out is the
clock-to-out delay of CLB R32C24, DBT is the delay
encountered by signals traveling from bottom to top from
row 32 to row 17 along net R32->R17, and K is the delay
associated with that portion of oscillator configuration
300 depicted using dashed lines, including the delay
induced by CLB R17C24.

[0028] Comparing periods T200 and T300 of respective
configurations 200 and 300 provides a measure of the skew
SK between adjacent destination branches. Subtracting
equation 2 from equation 1 gives:

T200 - T300 = (30SK + C + Clk->Out + DTB + K) -
              (C + Clk->Out + DBT + K)

            = 30SK + DTB - DBT (3)

Solving for skew SK provides:

SK = (T200 - T300 + DBT - DTB)/30 (4)

[0029] Different programmable logic devices route 
differently. For a given PLD, the values of delays DBT and 
DTB may be close enough to assume they cancel one another. 
This assumption reduces equation 4 to:

SK = (T200 - T300)/30 (5)

[0030] Thus, if DTB and DBT are equal, the difference
between periods T200 and T300 provides a measure of skew SK.
Of course, skew SK can also be used to find the skew
between non-adjacent destination branches 110D; for
example, the skew between destination
branches separated by a row of CLBs would be 2SK.
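
As an illustration of equations 4 and 5, the sketch below computes SK from two measured periods and scales it to non-adjacent branches. The function names and the period values (in picoseconds) are hypothetical assumptions. The optional `dbt_minus_dtb` argument defaults to zero, which corresponds to the equal-routing assumption of equation 5.

```python
def skew_two_periods(t200, t300, dbt_minus_dtb=0.0, branches=30):
    """Equation 4: SK = (T200 - T300 + DBT - DTB)/30.

    With dbt_minus_dtb == 0 this reduces to equation 5.
    """
    return (t200 - t300 + dbt_minus_dtb) / branches


def skew_between(sk, steps):
    """Skew across `steps` adjacent-branch intervals (e.g. 2SK for steps=2)."""
    return steps * sk


# Hypothetical measured periods in picoseconds.
sk = skew_two_periods(t200=10000.0, t300=9850.0)
print(sk)                   # 5.0
print(skew_between(sk, 2))  # 10.0
```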

[0031] It may be difficult or impossible to route some
PLDs such that the top-to-bottom connections (e.g., net
R2->R17) provide the same delays as the bottom-to-top
connections (e.g., net R32->R17). In such cases, equation
4 cannot be simplified to equation 5.

[0032] Figures 4 and 5 depict respective oscillator
configurations 400 and 500, the periods of which provide
additional data for finding the skew SK between adjacent
destination branches 110D in the event of unequal
top-to-bottom and bottom-to-top delays DTB and DBT. As with the
preceding figures, the dashed and bold lines indicate which 
components form the oscillators. The dashed lines 405, CLB 
R48C24, and feedback circuit 410 are identical circuit 
configurations in both Figures 4 and 5, and their 
equivalent delay contributions are symbolized by a constant 
M. The ring oscillators in each of Figures 4 and 5 can be 
configured as described in the above-incorporated Kingsley 
patent. In the depicted embodiment, CLB R48C24 is 
configured to be an asynchronous inverter, though different 
asynchronous or synchronous configurations might also be 
used. Circuit 410 and the associated connections 405 are 
made of available FPGA resources and connect to clock 
distribution network 110 via global clock buffer BUFG. 
[0033] The FPGA of Figures 2 through 4 is configured 
such that net R33->R48 of configuration 400 (Figure 4) is
identical to net R2->R17 of oscillator configuration 200
(Figure 2) so the delays DTB associated with these nets are 
identical, or nearly so. Likewise, net R63->R48 (Figure 5) 
is identical to net R32->R17 (Figure 3) so the delays DBT 
associated with these nets are identical. 
[0034] Using the same method described above for
determining the periods associated with oscillator
configurations 200 and 300, the respective periods T400 and
T500 of oscillator configurations 400 and 500 are:

T400 = C + Clk->Out + DTB + M (6)

and

T500 = 30SK + C + Clk->Out + DBT + M (7)

Subtracting equation 6 from equation 7 gives:

T500 - T400 = (30SK + C + Clk->Out + DBT + M) -
              (C + Clk->Out + DTB + M)

            = 30SK + DBT - DTB (8)

Solving for DBT - DTB gives:

DBT - DTB = T500 - T400 - 30SK (9)

[0035] Oscillator configurations 400 and 500 thus
provide a measure of the difference in delays between
bottom-to-top and top-to-bottom programmable
interconnections between rows of CLBs.
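
Equation 9 can be sketched as a one-line helper. The function name and the sample values (in picoseconds) are hypothetical. Because SK appears on the right-hand side, the result is only as good as the SK value supplied.

```python
def routing_asymmetry(t400, t500, sk, branches=30):
    """Equation 9: DBT - DTB = T500 - T400 - 30SK (all values in ps)."""
    return t500 - t400 - branches * sk


# Hypothetical measured periods and skew in picoseconds.
print(routing_asymmetry(t400=9000.0, t500=9230.0, sk=5.0))  # 80.0
```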

[0036] The result of equation 9, DBT - DTB, can be used to
solve for skew SK using equation 4 as follows:

SK = (T200 - T300 + T500 - T400 - 30SK)/30 (10)

or

SK = (T200 - T300 + T500 - T400)/60 (11)

[0037] Thus, the four oscillator configurations depicted
in Figures 2-5 collectively provide enough information to
determine the skew SK between adjacent destination branches
110D.
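
The following sketch checks equation 11 numerically. It synthesizes the four periods from equations 1, 2, 6, and 7 using hypothetical delay values (in picoseconds) with deliberately unequal DTB and DBT, then recovers SK. All names and numbers are illustrative assumptions, not measured data.

```python
def skew_four_periods(t200, t300, t400, t500, branches=30):
    """Equation 11: SK = (T200 - T300 + T500 - T400)/60."""
    return (t200 - t300 + t500 - t400) / (2 * branches)


# Hypothetical delays in picoseconds; DTB and DBT differ on purpose.
sk, c, clk_out, dtb, dbt, k, m = 5, 40, 900, 1200, 1280, 7710, 6500

t200 = 30 * sk + c + clk_out + dtb + k   # equation 1
t300 = c + clk_out + dbt + k             # equation 2
t400 = c + clk_out + dtb + m             # equation 6
t500 = 30 * sk + c + clk_out + dbt + m   # equation 7

print(skew_four_periods(t200, t300, t400, t500))  # 5.0, the SK used above
print((t200 - t300) / 30)  # equation 5 alone: about 2.33, biased by DBT != DTB
```

In this example the terms C, Clk->Out, K, M, DTB, and DBT all cancel in equation 11, whereas equation 5 alone is off by (DTB - DBT)/30.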

[0038] Skew measurements between vertical clock spines 
110V may also be of interest, and can be combined with the 
above-described skew measurements to give a comprehensive 
skew analysis for an entire device. Patent Application
Serial No. ______ entitled "METHODS AND CIRCUITS FOR
MEASURING CLOCK SKEW ON PROGRAMMABLE LOGIC DEVICES," by
Siuki Chan, filed herewith [docket number X-884], describes
methods of measuring skew between vertical clock spines and 
is incorporated herein by reference. 

[0039] FPGA components are connected in various ways: 
some components are directly connected, others are 
connected via intermediate components, such as buffers, and 
still others are programmably connectable, which is to say
they can be programmably connected via programmable 
interconnect resources. In each instance, components are 
connected to establish some desired electrical 
communication between two or more circuit nodes, or 
terminals. Such communication may typically be 
accomplished using a number of circuit configurations, as 
will be understood by those of skill in the art. 
[0040] While the present invention has been described in 
connection with specific embodiments, variations of these 
embodiments will be obvious to those of ordinary skill in 
the art. For example, multiple embodiments of the
above-described oscillator configurations can be used
simultaneously on devices that include more than one signal
tree for which skew measurements are of interest. 
Moreover, above-described skew measurements can be done in 
any order, and other columns of CLBs (e.g., column 25 of 
Figures 2-5) could be used to perform skew measurements. 
Therefore, the spirit and scope of the appended claims 
should not be limited to the foregoing description. 